Abstract. The usual step-down and step-up multiple testing procedures
most often lack an important intuitive, practical, and theoretical
property called the interval property. In short, the interval property 
is simply that for an individual hypothesis, among the several to be 
tested, the acceptance sections of relevant statistics are intervals. Lack 
of the interval property is a serious shortcoming. This shortcoming is 
demonstrated for testing various pairwise comparisons in multinomial 
models, multivariate normal models and in nonparametric models. 

Residual based stepwise multiple testing procedures that do have the 
interval property are offered in all these cases. 

Key words and phrases: All pairwise differences, change point, multi- 
nomial distributions, multivariate normal distributions, rank tests, step- 
down procedure, step-up procedure, stochastic order, treatments versus 
control. 
1. INTRODUCTION

Stepwise multiple testing procedures are valuable 
because they are less conservative than standard 
single-step procedures which often rely on Bonfer- 
roni critical values. In other words, they are more 
powerful than their single-step counterparts. In con- 
structing stepwise testing procedures it is common 
to begin with tests for the individual hypotheses 
that are known to have desirable properties. For ex- 
ample, the tests may be UMPU, they may have in- 
variance properties and are likely to be admissible. 
Then a sequential component is added that tells us 
which hypotheses to accept or reject at each step 
and when to stop. We begin with the realization 
that all stepwise procedures induce new tests on the 
individual testing problems. Carrying out a stepwise
procedure in a multiple hypothesis testing problem 
is equivalent to applying these induced tests sep- 
arately to the individual hypotheses. Thus, if the 
induced individual tests can be improved, then the 
entire procedure is improved. Due to the sequen- 
tial component, the nature of these induced tests 
is typically complicated and overlooked. Unfortu- 
nately they frequently do not retain all the desirable 
properties that the original tests possessed. 

In this paper we focus on an important type of 
practical property (which in many models is also 
a necessary theoretical property) that we call the 
interval property. This is a desirable property that 
the original tests would typically have but that the 
stepwise induced tests can easily lose. Informally the 
interval property is simply that the resulting test has 
acceptance sections that are intervals. 

To further clarify, suppose one is constructing a test 
for a one-sided hypothesis testing problem. In addi- 
tion to asking for other properties it is sensible to 
examine the acceptance and rejection regions. There 
are often pairs of sample points, X and X* , for which 
there are compelling practical (and sometimes the- 
oretical) reasons for the following to be true. If the 
point X is in the rejection region, then the point X* 
should also be in the rejection region. The practical
desirability of this property is usually due to the
fact that it is intuitively "clear" that X* is a stronger
indication of the alternative than is X. In the case
of two-sided hypotheses there are often triples of
points, X, X* and X** (on the same line), such that
if both X and X** are in the acceptance region, then
one would also want X* to be in the acceptance region
if in fact X* was not the most indicative of the
alternative of the three points.

Table 1
Health Status data at sample point x

           Same   Improved   Cured   Total
Placebo      15        226       4     245
Dose 1        4        226      15     245
Dose 2        6        196      43     245

We illustrate this idea with an example that will 
be treated fully in Section 5.1. Suppose one observes 
the data in Table 1 based on the three labeled in- 
dependent treatments. One of the hypotheses of in- 
terest is whether or not the distribution for Dose 1 
is stochastically larger than that for the placebo. If 
the method used decides in favor of stochastic or- 
der based on observing Table 1, then it should also 
decide in favor of Dose 1 if Table 2 is observed. Re- 
peated use of a test procedure not having this prop- 
erty will ultimately lead to conclusions that seem 
contradictory and would be difficult to justify. The 
interval property is not only natural but is necessary 
for admissibility. We will return to Tables 1 and 2 
later in Section 5.1. 
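The relation between the two tables can be made concrete with a few lines of arithmetic. The sketch below is illustrative only (the array and variable names are chosen here, not taken from the paper); it computes the entrywise difference x* - x and confirms that the row totals are unchanged.

```python
# Tables 1 and 2 as 3 x 3 count arrays.
# Rows: Placebo, Dose 1, Dose 2. Columns: Same, Improved, Cured.
# Variable names are illustrative assumptions.
table1 = [[15, 226, 4], [4, 226, 15], [6, 196, 43]]
table2 = [[16, 226, 3], [3, 226, 16], [6, 196, 43]]

# Entrywise difference x* - x.
diff = [[b - a for a, b in zip(r1, r2)] for r1, r2 in zip(table1, table2)]

for name, row in zip(["Placebo", "Dose 1", "Dose 2"], diff):
    print(f"{name:8s} {row}")

# Row totals (sample sizes per treatment) in each table.
totals1 = [sum(r) for r in table1]
totals2 = [sum(r) for r in table2]
```

Going from Table 1 to Table 2 moves one Placebo patient from Cured to Same and one Dose 1 patient from Same to Cured, while the Dose 2 row is untouched. This is the sense in which Table 2 carries at least as much evidence that Dose 1 is stochastically larger than the placebo.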

We study this idea in the most common of multi- 
ple testing situations, that is, those where hypothe- 
ses under consideration involve collections of pair- 
wise differences. The most common of these are 
(i) treatments versus control problems, (ii) change 
point problems and (iii) problems examining all pair- 
wise differences. We will investigate these problems 
in a broad spectrum of models: univariate models 
involving means or variances, multivariate models 
concerning mean vectors, ordinal data models involving
equality of multinomial distributions and
nonparametric models involving equality of distributions.

Table 2
Health Status data at sample point x*

           Same   Improved   Cured   Total
Placebo      16        226       3     245
Dose 1        3        226      16     245
Dose 2        6        196      43     245

Two popular types of multiple testing procedures 
for such problems are a step-down procedure (to be 
defined later) and a step-up procedure. To simplify 
the presentation we focus mainly on the step-down 
procedure as analogous results can be obtained for 
the FDR controlling step-up procedure of Benjamini 
and Hochberg (1995). We will see that these step- 
down induced tests often do not retain the inter- 
val property. In fact, among all the models consid- 
ered the usual step-down procedure maintains the 
interval property only when testing treatments ver- 
sus control in the one-sided case. We will also show 
how to construct a step-down procedure that does 
have the interval property. Furthermore, it should 
be clear from the examples and from the way that 
the methods are used that this phenomenon exists 
in a far greater variety of models. 

The usual step-down procedure is given in Lehmann 
and Romano (2005). For testing all pairwise compar- 
isons variations are offered in Holm (1979), Shaf- 
fer (1986), Royen (1989) and Westfall and Tobias 
(2007). The lack of the interval property in a one- 
way ANOVA model for testing all pairwise contrasts 
is shown in Cohen, Sackrowitz and Chen (2010) (CSC) 
under a normal model. It has also been demonstrated 
for rank tests in a one-way ANOVA model in Cohen 
and Sackrowitz (2012) (CS). 

Many multiple testing procedures are designed to 
control some error rate such as the familywise error
rate FWER (weak and strong), the false discovery
rate FDR and k-FWER (see Lehmann and
Romano, 2005). Some researchers also take a finite
action decision theory approach with a variety
of loss functions (e.g., Genovese and Wasserman,
2002). In these studies procedures are evaluated
and compared by their risk functions. The risk
function approach does not always require
control of a particular type of error rate. Dudoit
and Van der Laan (2008) study expected values
of functions of numbers of Type I and Type II
errors. In any particular application one would typ- 
ically have a sense of desirable criteria as well as 
those portions of the parameter space that are most 
relevant. To get a more complete understanding of 
the behavior of one's procedure we recommend that, 
if feasible, error control and risk function properties 
should be examined. 

In this paper we specify procedures that have the 
interval property for a much wider class of both 
univariate and multivariate models. For exponential
family models, where individual test statistics
are dependent, each individual test induced by usual 
step-down and step-up procedures has been shown 
to be inadmissible with respect to the classical hy- 
pothesis testing 0-1 loss. See Cohen and Sackrowitz 
(2005, 2007, 2008) and CSC (2010) cited above. 
Those proofs are based on results of Matthes and 
Truax (1967) that, in effect, say that the interval 
property is equivalent to admissibility. One implica- 
tion of this is that no Bayesian approach would lead 
to a procedure that lacks the interval property. Thus 
no prior distribution can be used to explain a lack 
of the interval property. 

Lack of the interval property not only means that, 
in exponential family models, procedures exist with 
both better size and power for every individual hypothesis,
but it may also lead to very counterintuitive
results. It is hard to believe a client would be
happy with a procedure that could reject
a null hypothesis in one instance and then
accept the same hypothesis in another instance
when the evidence against it is more
compelling in the latter case.

The methodology we present leads to procedures 
that are admissible. Furthermore, their operating 
characteristics often compare favorably with the 
usual step-down procedures. This behavior can be 
seen from the simulations presented in Cohen, Sackrowitz
and Xu (2009) (CSX). In that same paper
a family of residual based procedures was defined.
The step-down procedures having the interval prop- 
erty that will be presented in this paper stem from 
those procedures. They are exhibited in special cases 
in CSC (2010) and CS (2012). 

In the models considered here, the Residual based
Step-Down procedures, labeled RSD, exhibit two
important characteristics. First, the procedure begins
with the set $S = \{1, 2, \ldots, k\}$, where each integer is
associated with a population. Next, based on all the data,
$S$ is partitioned into a collection of disjoint sets through
a sequential process. Finally, hypothesis $H_{ij}$ (that
population $i$ is equal to population $j$) is accepted if and
only if both $i$ and $j$ are in the same set of the final
partition. Second, the partitioning process is based
on the pooling of various samples (depending on the
particular model at hand) at each stage. The final
partition of the set is reached through a sequence of
partitions that become finer at each step.

There are some noteworthy differences between 
step-up or step-down and RSD. Depending on the 
collection of hypotheses being tested, there will be
correlation between many of the test statistics being
used. Neither step-up nor step-down allows for
this in the construction of the test statistic itself. 
Thus those test statistics will be the same regardless 
of the correlation structure. The RSD methodology 
yields statistics that are determined by the correla- 
tion structure. Furthermore, the RSD test statistics 
change at each step depending on the actions taken 
at the previous step. 

Unfortunately, insight as to why the interval prop- 
erty will ensue in some cases but not others is still 
wanting. The crucial element seems to be the way 
the test statistics and stopping rules mesh and this 
must be checked mathematically. 

We point out that many of the step-down proce- 
dures discussed here are symmetric in the sense that 
whatever is true for any one hypothesis to be tested 
is also true for the other hypotheses to be tested. So 
although the lack of the interval property is shown 
for one particular testing problem, it is true for all 
individual problems. This takes on added signifi- 
cance for exponential family models. It means that 
every individual test is inadmissible. When the num- 
ber of hypotheses is large, the number of opportuni- 
ties for inconsistent decisions also gets to be large. 
For risk functions that would sum mistakes, such as
the classification risk (Genovese and Wasserman,
2002), this could amount to considerable error.

Lastly, we mention the issue of critical values. The 
shortcoming of RSD and to some extent all stepwise 
procedures is in determining sharp critical values. 
This is particularly true in the face of dependence,
which is exactly the situation in which usual stepwise
procedures tend to lack the interval property.
With knowledge (based on practicality) of relevant 
criteria and relevant portions of the parameter space 
as focus, one can search for appropriate critical val- 
ues using simulations. A good first simulation for 
RSD is to use the critical values suggested in the 
work of Benjamini and Gavrilov (2009) and modify 
them if necessary. The standard step-up and step- 
down procedures do not take dependency into ac- 
count in choosing a level and can also benefit by us- 
ing simulation to modify their critical values. As ex- 
amples, two simulations are given for a simple model 
in Section 6.3. There we compare RSD and step-up 
in a treatments versus control setting. 

In the next section we give models and defini- 
tions. Several models, for which the results of the 
paper hold, are listed. These include normal mod- 
els, multinomial models, and arbitrary continuous 
distribution models treated nonparametrically. Section 3
discusses counterexamples to the interval property.
In Section 4 we introduce a step-down method,
called RSD, that leads to procedures that do have 
the interval property. Sections 5, 6 and 7 contain 
results for multinomial models, multivariate normal 
models and nonparametric models, respectively. 

2. MODELS AND DEFINITIONS 

Let $\pi_i$, $i = 1, \ldots, k$, be $k$ independent populations.
Data from population $\pi_i$ is denoted by a $q \times 1$ vector
$X_i$, and $X$ represents $(X_1', \ldots, X_k')'$.

Hypotheses of interest, for particular $(i, j)$ combinations,
are denoted by $H_{ij} : \pi_i = \pi_j$ versus $K_{ij} : \pi_i \ne \pi_j$
or $K_{ij} : \pi_i < \pi_j$. The latter one-sided case can be
interpreted as the difference in two scalar parameters
in case $\pi_i$ is characterized by a single parameter,
or $<$ can be interpreted as $\pi_j$ is stochastically larger
than $\pi_i$ in case the $\pi_i$ are multinomial distributions or
other distributions not necessarily characterized by
parameters. We consider situations where there are
at least two connected hypotheses among those to
be tested, that is, an $H_{ij}, H_{jm}$ or an $H_{ij}, H_{im}$. We
study the following three problems in the domain of
pairwise differences:

1. All pairwise differences. Here $H_{ij} : \pi_i = \pi_j$ versus
   $K_{ij} : \pi_i \ne \pi_j$, all $i < j$, $i, j = 1, \ldots, k$.

2. Change point. $H_{i(i+1)} : \pi_i = \pi_{i+1}$ versus $K_{i(i+1)} :
   \pi_i < \pi_{i+1}$, $i = 1, \ldots, k - 1$, where $<$ can mean
   stochastically less than or, if $\pi_i$ is characterized by
   a parameter, it simply means that the parameter
   for population $i$ is less than the parameter for
   population $i + 1$. Two-sided alternatives can also
   be considered.

3. Treatments versus control. $H_{ik} : \pi_i = \pi_k$ versus
   $K_{ik} : \pi_i \ne \pi_k$, $i = 1, \ldots, k - 1$.

Problems 1, 2 and 3 will be studied for the following
probability models:

1. $\pi_i$ are independent multinomial distributions. For
   problem 2 assume $\pi_1 \le \pi_2 \le \cdots \le \pi_k$ so that the
   alternative hypotheses are strict stochastic order.

2. $\pi_i$ are independent $p$-variate normal distributions
   with unknown mean vectors $\mu_i$ and known covariance
   matrix $\Sigma$.

3. Assume $\pi_i$ has c.d.f. $F_i$ with $F_i$ continuous. For
   problem 2 assume $F_1 \le \cdots \le F_k$ so alternatives
   are strict stochastic order.

The intuitive description of the interval property
given in Section 1 will be given a formal interpretation
on a case by case basis as follows. In each
specific model, when $H_{ij}$ is being tested, a vector $\mathbf{g}_{ij}$
will be identified based on compelling practical (and/or
theoretical) considerations so that a nonrandomized
test $\varphi_{ij}(x)$ will be said to have the interval property
(relative to the identified $\mathbf{g}_{ij}$) if $\varphi_{ij}(x + a\mathbf{g}_{ij})$

(i) is nondecreasing as a function of $a$ in the one-sided case,

(ii) has a convex acceptance region in $a$ in the two-sided case.

These practical considerations turn out to involve
only the data coming from the populations $\pi_i$ and $\pi_j$
as they are independent of all the other populations.
Thus $\mathbf{g}_{ij}$ will be seen to have entries of 0 for all
coordinates that do not correspond to data from $\pi_i$
or $\pi_j$. Let $g_{ij}$ be the $2q \times 1$ vector consisting of the
elements of $\mathbf{g}_{ij}$ that pertain to $\pi_i$ and $\pi_j$.
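Conditions (i) and (ii) can be probed numerically by evaluating a test along the line x + a g. The following is a minimal sketch under stated assumptions: the function names, the grid and the toy two-sample test are all hypothetical, and a grid check can only expose a violation, it cannot prove the property.

```python
from typing import Callable, List, Sequence

def decisions_along_line(test: Callable[[Sequence[float]], int],
                         x: Sequence[float], g: Sequence[float],
                         a_grid: Sequence[float]) -> List[int]:
    """Evaluate a 0/1 test (1 = reject) at the points x + a*g, a in a_grid."""
    return [test([xi + a * gi for xi, gi in zip(x, g)]) for a in a_grid]

def is_nondecreasing(decisions: Sequence[int]) -> bool:
    """One-sided case (i): once the test rejects, it keeps rejecting."""
    return all(d1 <= d2 for d1, d2 in zip(decisions, decisions[1:]))

def has_interval_acceptance(decisions: Sequence[int]) -> bool:
    """Two-sided case (ii): accepted grid points form one contiguous run."""
    accepted = [k for k, d in enumerate(decisions) if d == 0]
    return not accepted or accepted[-1] - accepted[0] + 1 == len(accepted)

# Hypothetical one-sided two-sample test: reject when x_2 - x_1 > 1.5.
one_sided = lambda z: int(z[1] - z[0] > 1.5)
grid = [0.25 * k for k in range(9)]            # a = 0, 0.25, ..., 2
d = decisions_along_line(one_sided, [0.0, 0.0], [-1.0, 1.0], grid)
print(d, is_nondecreasing(d))
```

A single-step test of this kind passes the check; the induced tests of a step-down procedure are the ones that can fail it.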

Now let $T_{ij}(x_i, x_j)$ be the two-population test statistic
for testing $H_{ij}$ that, when only $(x_i, x_j)$ are
observed, is the basis of the usual step-down procedure.
When all of $x$ is observed we define $T_{ij}(x) =
T_{ij}(x_i, x_j)$. That is, $T_{ij}$ is a function that depends
on $x$ only through $(x_i, x_j)$.

Also let $\psi_{ij}(x_i, x_j)$ be the nonrandomized test
function which utilizes $T_{ij}(x_i, x_j)$. That is, for a one-sided
test $\psi_{ij}(x_i, x_j) = 1$ if $T_{ij}(x_i, x_j) > C$ and
$\psi_{ij}(x_i, x_j) = 0$ otherwise. For a two-sided test
$\psi_{ij}(x_i, x_j) = 1$ if $T_{ij}(x_i, x_j) < C_L$ or $T_{ij}(x_i, x_j) > C_U$.
Otherwise $\psi_{ij}(x_i, x_j) = 0$.

In the vast majority of multiple testing problems
the same two-sample test statistic is used for every
$H_{ij}$. To simplify notation we will use this setting.
Extension to the general case would follow easily.
Thus, when clear, we suppress subscript notation
for two-sample functions as follows:

$g_{ij} = g$, $T_{ij}(x_i, x_j) = T(x_i, x_j)$ and
$\psi_{ij}(x_i, x_j) = \psi(x_i, x_j)$, all $i < j$.

We will say $\psi(x_i, x_j)$ has the interval property relative
to $g$ in the two-sample problem if $\psi((x_i', x_j')' + ag)$
satisfies (i) and (ii) above.

At this point we describe the usual step-down procedure
for multiple testing of a collection of hypotheses
$H_{ij}$ based on statistics $T(x_i, x_j)$. See, for example,
Cohen, Sackrowitz and Xu (2009). We describe
the procedure for one-sided alternatives. For two-sided
alternatives sometimes statistics are absolute
values or upper and lower critical values are used.
For one-sided alternatives let $K$ be the number of
hypotheses to be tested and let $0 < C_1 < C_2 < \cdots < C_K$
be critical values. Define the collection of pairs
$Q = \{(i, j) : H_{ij} \text{ is to be tested}\}$.

Step 1: Let $T_{i_1 j_1} = \max_{(i,j) \in Q} T(x_i, x_j)$. If $T_{i_1 j_1} \le C_K$,
accept all hypotheses and stop. If $T_{i_1 j_1} > C_K$, reject
$H_{i_1 j_1}$ and go to step 2.

Step 2: Consider $T_{i_2 j_2} = \max_{(i,j) \in Q \setminus \{(i_1, j_1)\}} T(x_i, x_j)$.
If $T_{i_2 j_2} \le C_{K-1}$, accept all remaining hypotheses. If
$T_{i_2 j_2} > C_{K-1}$, reject $H_{i_2 j_2}$ and go to step 3.

Step m: Consider

$T_{i_m j_m} = \max_{(i,j) \in Q \setminus \{(i_1, j_1), \ldots, (i_{m-1}, j_{m-1})\}} T(x_i, x_j)$.

If $T_{i_m j_m} \le C_{K-(m-1)}$, accept all remaining hypotheses.
If $T_{i_m j_m} > C_{K-(m-1)}$, reject $H_{i_m j_m}$ and go to
step $(m + 1)$.

We remark that the RSD methods presented are
also based on the function $T$. However, the arguments
used are not $(x_i, x_j)$.
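The steps above can be sketched in a few lines of code. This is a minimal illustration for one-sided alternatives, not the authors' implementation; the function name, the dictionary layout and the numerical values in the usage lines are assumptions made here.

```python
from typing import Dict, List, Sequence, Tuple

Pair = Tuple[int, int]

def step_down(stats: Dict[Pair, float], crit: Sequence[float]) -> List[Pair]:
    """Usual step-down procedure for one-sided alternatives.

    stats maps each pair (i, j) in Q to its statistic T(x_i, x_j).
    crit holds C_1 < C_2 < ... < C_K with K = len(stats).
    Returns the rejected pairs in the order in which they were rejected.
    """
    if len(crit) != len(stats):
        raise ValueError("need one critical value per hypothesis")
    remaining = dict(stats)
    rejected: List[Pair] = []
    K = len(crit)
    for m in range(1, K + 1):
        pair = max(remaining, key=remaining.get)  # largest remaining statistic
        if remaining[pair] <= crit[K - m]:        # compare with C_{K-(m-1)}
            break                                 # accept all remaining, stop
        rejected.append(pair)
        del remaining[pair]
    return rejected

# Hypothetical statistics for all pairwise comparisons of three populations.
crit = [1.0, 2.0, 2.5]
print(step_down({(1, 2): 2.6, (1, 3): 0.4, (2, 3): 2.1}, crit))  # [(1, 2), (2, 3)]
print(step_down({(1, 2): 2.4, (1, 3): 0.4, (2, 3): 2.1}, crit))  # []
```

In the second call nothing is rejected, even though the statistic for (2, 3) exceeds $C_2$, because the procedure stops at step 1. This dependence of one hypothesis on the statistics of the others is what the induced tests inherit.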

3. PROTOTYPE COUNTEREXAMPLES TO 
THE INTERVAL PROPERTY 

In this section we describe the fundamentals of 
searching for points at which step-down procedures 
might violate the interval property. The idea is to 
capitalize on a consequence of the sequential process
as follows. Suppose, when $x$ is observed, the
step-down procedure rejects $H_{ij}$ based on the value
of $T_{ij}(x)$ but does not do so until stage $m > 1$. Further
suppose that when $x^*$ is observed there is even
more evidence to reject $H_{ij}$ based on $T_{ij}(x^*)$. The
difficulty is that the stopping rule may prevent the
procedure from even reaching stage $m$ when $x^*$ is
observed.

To demonstrate we will consider some multiple
testing situations using only three populations $\pi_1$,
$\pi_2$, $\pi_3$. All the fundamentals can be seen in the
case that all $X_i$ are one-dimensional and $T_{ij} = x_j - x_i$
in the one-sided case and $T_{ij} = |x_i - x_j|$ in the
two-sided case. Figures 1 and 2 give an intuitive
sense of the sort of behavior that one seeks for a violation
of the interval property. To extend these
ideas to more general situations we use the figures
to determine the desired relative positions (with distances
measured by the value of the test statistic)
of sample points as one moves along the sequence of
points $x$, $x^*$ and $x^{**}$.

Figure 1 is appropriate when the (change point)
hypotheses to be tested are $H_{12} : \pi_1 = \pi_2$ versus $K_{12} :
\pi_1 < \pi_2$ and $H_{23} : \pi_2 = \pi_3$ versus $K_{23} : \pi_2 < \pi_3$. Suppose
$\mathbf{g}_{12} = (-1, 1, 0)$. When $x$ is observed $H_{23}$ is rejected
at stage 1 and then $H_{12}$ is rejected at stage 2.
When $x^* = x + (2\varepsilon)\mathbf{g}_{12}$ is observed $H_{23}$ is now accepted
at stage 1, causing the procedure to stop.
Thus $H_{12}$ is now accepted despite an increase in evidence
against it.

[Figure 1: points $x$ and $x^*$, a step of $2\varepsilon$ apart; labels
$T_{12} = C_1 + \varepsilon$ at $x$, and $T_{12} = C_1 + 5\varepsilon$, $T_{23} = C_2 - \varepsilon$ at $x^*$.]

Fig. 1. Violation of interval property for one-sided change
point problem.

Figure 2 is appropriate when the (treatments versus
control) hypotheses to be tested are $H_{13} : \pi_1 = \pi_3$
versus $K_{13} : \pi_1 \ne \pi_3$ and $H_{23} : \pi_2 = \pi_3$ versus $K_{23} :
\pi_2 \ne \pi_3$. Here $\pi_3$ is the control and $\mathbf{g}_{13} = (-1, 0, 1)$.
When $x$ is observed $H_{23}$ is rejected at stage 1 and
then $H_{13}$ is accepted at stage 2. When $x^* = x +
((C_1 + \varepsilon)/2)\mathbf{g}_{13}$ is observed $H_{23}$ is rejected at stage
1 and then $H_{13}$ is rejected at stage 2. Finally, when
$x^{**} = x^* + (2\varepsilon)\mathbf{g}_{13}$ is observed both hypotheses are
accepted. In the sample space as we go from $x$ to $x^*$
to $x^{**}$ the evidence against $H_{13}$ continues to mount.
Yet the step-down procedure's decisions are to accept,
reject and then accept again on this sequence
of points.

[Figure 2: points $x$, $x^*$ and $x^{**}$; labels
$T_{12} = T_{23} = (C_1 + 2C_2 + 3\varepsilon)/2$ at $x$;
$T_{13} = C_1 + \varepsilon$, $T_{23} = C_2 + \varepsilon$ at $x^*$;
$T_{13} = C_1 + 5\varepsilon$, $T_{23} = C_2 - \varepsilon$ at $x^{**}$.]

Fig. 2. Violation of interval property for two-sided treatments
versus control problem.

Figure 2 is also appropriate when testing all pairwise
comparisons provided $C_1 + 2C_2 > 2C_3$.
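Both counterexamples can be checked numerically. The sketch below is illustrative only: the critical values $C_1 = 1$, $C_2 = 2$ and $\varepsilon = 0.1$ are hypothetical choices satisfying $C_1 + 5\varepsilon < C_2$, the sample points are picked to reproduce the statistic values described in the text, and the helper names are made up here.

```python
C1, C2, EPS = 1.0, 2.0, 0.1   # hypothetical values with C1 + 5*EPS < C2

def rejects_target(t_target: float, t_other: float) -> bool:
    """Step-down decision for one of K = 2 hypotheses (True = reject)."""
    if max(t_target, t_other) <= C2:   # stage 1: stop and accept everything
        return False
    if t_target >= t_other:            # target has the largest statistic
        return True
    return t_target > C1               # stage 2: only the target remains

def shift(x, g, a):
    """Return the point x + a*g."""
    return [xi + a * gi for xi, gi in zip(x, g)]

# Figure 1: one-sided change point, T12 = x2 - x1, T23 = x3 - x2.
def reject_h12(x):
    return rejects_target(x[1] - x[0], x[2] - x[1])

x = [0.0, 1.1, 3.2]                        # T12 = C1 + EPS, T23 = C2 + EPS
x_star = shift(x, [-1, 1, 0], 2 * EPS)     # T12 = C1 + 5*EPS, T23 = C2 - EPS
fig1 = [reject_h12(x), reject_h12(x_star)]

# Figure 2: two-sided treatments versus control,
# T13 = |x1 - x3|, T23 = |x2 - x3|, population 3 is the control.
def reject_h13(y):
    return rejects_target(abs(y[0] - y[2]), abs(y[1] - y[2]))

y = [0.0, (C1 + 2 * C2 + 3 * EPS) / 2, 0.0]
y_star = shift(y, [-1, 0, 1], (C1 + EPS) / 2)
y_2star = shift(y_star, [-1, 0, 1], 2 * EPS)
fig2 = [reject_h13(y), reject_h13(y_star), reject_h13(y_2star)]

print(fig1)   # [True, False]
print(fig2)   # [False, True, False]
```

In the first case the decision on $H_{12}$ goes from reject to accept as the evidence against it grows. In the second the decisions on $H_{13}$ are accept, reject, accept along a single line, so the acceptance section is not an interval.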

4. RSD FEATURES AND FIRST PROPERTIES 

In this section we describe some specifics of the 
step-down procedures we will present that do have 
the interval property. As previously mentioned, deci- 
sions are, in effect, based on a final partition of the 
set S = {1, 2, . . . , k} that is reached through a se- 
quence of data based partitions that become finer 
at each step. Each integer is associated with a population.
Suppose the hypothesis $H_{ij} : \pi_i = \pi_j$ is under
consideration. Then $H_{ij}$ is rejected if and only
if $i$ and $j$ are in different sets of the final partition
of $S$. The precise rules for the partitioning depend
on the model and the data. Illustrative examples of 
the process will be given at the end of this section. 
However, certain principles are common to all mod- 
els. 

At the first step the process either stops and S 
itself is the final partition (in this case no hypoth- 
esis can be rejected) or S is divided into two sets. 
At any future step the process either stops or one of 
the sets in the current partition is divided into two 
nonempty sets. The types of allowable sets in the 
partition process are often restricted by the particular
model being considered. For the process to begin
we must determine three (model-driven) classes
of sets, $\Omega$, $\Omega_1$ and $\Omega_2$. At any step only sets that
lie in $\Omega$ are eligible to be split. Of course each set in $\Omega$ must
contain at least two integers. One way the process
will be stopped is if the current partition contains
no such sets. Further, if a set $B \in \Omega$ is to be divided
into $A$ and $B \setminus A$ we require $A \in \Omega_1$ and $B \setminus A \in \Omega_2$.
It is often the case that $\Omega_1 = \Omega_2$. Whenever a set, say
$B = \{i_1, \ldots, i_m\}$, is under consideration to be split
into two parts the decision is based on some metric
$H(A, B \setminus A; x)$ of set dispersion. Here $H$ is defined
only for $A \subset B$ with $A$ and $B \setminus A$ both nonempty.



For any set of integers, $A$, define

(4.1)  $n(A) = \text{number of integers in } A$  and  $Y(A; x) = \sum_{j \in A} x_j$.

Due to the pairwise nature of each $H_{ij}$ the functions
$H(A, B \setminus A; x)$ used in the various multiple
testing problems will be chosen to depend only on
the functions $n(\cdot)$ and $Y(\cdot; x)$. Next let, for any $B \in \Omega$,

$D(B; x) = \max_{A \subset B,\ A \in \Omega_1,\ B \setminus A \in \Omega_2} H(A, B \setminus A; x)$

and let the max be attained for the set $A_B$. That is,
$D(B; x) = H(A_B, B \setminus A_B; x)$. If the set $B$ is ever to
be divided, it will be split into $A_B$ and $B \setminus A_B$. The
dependence of $A_B$ on $x$ will usually be suppressed
in the notation.

Let $\{C_m\}$, $m = 1, \ldots, k$, be an increasing set of critical
values. Suppose that for some sample point $x$
stage $m$ is reached and the current partition entering
stage $m$ is denoted by $B_{1m}, \ldots, B_{mm}$. If

$\max(D(B_{1m}; x), \ldots, D(B_{mm}; x)) > C_{k+1-m}$

then split the set corresponding to the largest
$D(B_{im}; x)$ and continue to the next stage. Otherwise
stop.

This construction leads to the following two basic 
results. 

Theorem 4.1. Suppose $H(A, B \setminus A; x + a\mathbf{g})$ has
the following properties. It is

(i) a nondecreasing function of $a$ if $i \in A$, $j \in B \setminus A$
or $j \in A$, $i \in B \setminus A$;

(ii) constant as a function of $a$ if $\{i, j\} \subset A$ or
$\{i, j\} \subset B \setminus A$;

(iii) constant as a function of $a$ if $\{i, j\} \cap B = \emptyset$.

If the final partition at the sample point $x$ places $i$
and $j$ in different sets, then the final partition at
$x^* = x + a\mathbf{g}$, $a > 0$, will also place $i$ and $j$ in different
sets.

Proof. Since the final partition at the point $x$
placed $i$ and $j$ in different sets, the partitioning process
continued, at least, until $i$ and $j$ were separated.
Consider any stage in which $i$ and $j$ have not yet
been separated. In that partition let $B^*$ denote the
set containing both $i$ and $j$. By assumptions (i)-(iii),
for any $B$ in that same partition we must have
$H(A, B \setminus A; x + a\mathbf{g}) = H(A, B \setminus A; x)$ unless $B = B^*$
and $i \in A$, $j \in B^* \setminus A$ or $j \in A$, $i \in B^* \setminus A$.
In that case if they are not equal, then [by (i)] we
must have $H(A, B^* \setminus A; x + a\mathbf{g}) > H(A, B^* \setminus A; x)$.
Thus $i$ and $j$ would become separated at the point
$x + a\mathbf{g}$ at least as early as they were at the point $x$.
The result now follows. □

Theorem 4.2. Suppose $H(A, B \setminus A; x + a\mathbf{g})$ has
the following properties. It is

(i) nonincreasing and then nondecreasing as
a function of $a$ if $i \in A$, $j \in B \setminus A$ or $j \in A$,
$i \in B \setminus A$;

(ii) constant as a function of $a$ if $\{i, j\} \subset A$ or
$\{i, j\} \subset B \setminus A$;

(iii) constant as a function of $a$ if $\{i, j\} \cap B = \emptyset$.

If the final partition at the sample point $x$ places $i$
and $j$ in the same set but the final partition at the
sample point $x^* = x + a_1\mathbf{g}$, $a_1 > 0$, places $i$ and $j$
in different sets, then the final partition at $x^{**} =
x + a_2\mathbf{g}$, $a_2 > a_1$, will also place $i$ and $j$ in different
sets.

Proof. Since the final partition at the point $x$
placed $i$ and $j$ in the same set the partitioning process
stopped before $i$ and $j$ were separated. Consider
any stage and suppose $B^*$ is the set in the partition
containing both $i$ and $j$. By assumptions (i)-(iii), for
any $B$ in that partition we must have $H(A, B \setminus A;
x + a_1\mathbf{g}) = H(A, B \setminus A; x)$ unless $B = B^*$ and $i \in A$,
$j \in B^* \setminus A$ or $j \in A$, $i \in B^* \setminus A$. Since $i$ and $j$
are separated in the final partition at the point $x +
a_1\mathbf{g}$ we must have, at some stage, $H(A, B^* \setminus A; x +
a_1\mathbf{g}) > H(A, B^* \setminus A; x)$ for some $A$. It now follows
from (i) that $H(A, B^* \setminus A; x + a_2\mathbf{g}) \ge H(A, B^* \setminus A;
x + a_1\mathbf{g})$ for this $A$. Hence $i$ and $j$ will be separated
at the point $x + a_2\mathbf{g}$ at least as early as they were
at $x + a_1\mathbf{g}$. □

We conclude this section with some examples of 
the partitioning process using simple models. 

Example 4.1 (Treatments versus control in a normal
model). Let $X_i \sim N(\mu_i, 1)$, $i = 1, 2, 3, 4$, be independent.
Let $i = 4$ represent the control population
and $i = 1, 2, 3$ represent the treatment populations.
The objective is to test $H_{i4} : \mu_i = \mu_4$ versus
$K_{i4} : \mu_i \ne \mu_4$, $i = 1, 2, 3$.

To determine an RSD procedure we have opted to
begin by taking $\Omega$ to be the collection of all sets containing
the integer 4 (control) and at least one other
integer chosen from $\{1, 2, 3\}$. $\Omega_1$ is the collection of
sets containing exactly one integer from among 1,
2 and 3. $\Omega_2$ is the collection of sets containing the
integer 4. As our $H(A, B \setminus A; X)$ function we will
use

(4.2)  $H(A, B \setminus A; X) = \left| \sum_{j \in A} X_j / n(A) - \sum_{j \in B \setminus A} X_j / n(B \setminus A) \right| / r$,

where $r = \sqrt{1/n(A) + 1/n(B \setminus A)}$.

We take our three constants from the Benjamini
and Gavrilov (2009) critical values by using the normal
distribution with $\alpha = 0.05$. That is, $C_1 = 1.48$,
$C_2 = 1.97$ and $C_3 = 2.40$. To fix ideas we will take
some simple numbers and let $X_1 = 1$, $X_2 = 4$, $X_3 =
-2$, $X_4 = 0$.

By our choice of $\Omega_1$ one set must contain only
one integer and be of the form $A = \{i\}$. Thus at
step 1, the RSD procedure considers the following
three possible partitions of $S$:

(i) $A = \{1\}$, $S \setminus A = \{2, 3, 4\}$,
(ii) $A = \{2\}$, $S \setminus A = \{1, 3, 4\}$,
(iii) $A = \{3\}$, $S \setminus A = \{1, 2, 4\}$.

Thus we have $n(A) = 1$ and $n(S \setminus A) = 3$ in all
three cases. When $A = \{i\}$ the function $H$ becomes

$H(A, S \setminus A; X) = \left| X_i - \sum_{j \in S \setminus A} X_j / 3 \right| / \sqrt{4/3}$.

In case (i)

$H = H(\{1\}, \{2, 3, 4\}; X) = |1 - (4 - 2 + 0)/3| / \sqrt{4/3} = 0.29$.

In case (ii)

$H = H(\{2\}, \{1, 3, 4\}; X) = |4 - (1 - 2 + 0)/3| / \sqrt{4/3} = 3.75$.

In case (iii)

$H = H(\{3\}, \{1, 2, 4\}; X) = |-2 - (1 + 4 + 0)/3| / \sqrt{4/3} = 3.18$.

The largest of these is 3.75 which is greater than
$2.40 = C_3$. Thus, at step 1, $S$ is split into $\{2\}$ and
$\{1, 3, 4\}$ and we continue to step 2. Next we consider
splitting $B = \{1, 3, 4\}$ into two parts where the
possibilities are

(iv) $A = \{1\}$ and $B \setminus A = \{3, 4\}$,
(v) $A = \{3\}$ and $B \setminus A = \{1, 4\}$.

Thus we have $n(A) = 1$ and $n(B \setminus A) = 2$ in both
cases. When $A = \{i\}$ the function $H$ becomes

$H(A, B \setminus A; X) = \left| X_i - \sum_{j \in B \setminus A} X_j / 2 \right| / \sqrt{3/2}$.

In case (iv)

$H = H(\{1\}, \{3, 4\}; X) = |1 - (-2 + 0)/2| / \sqrt{3/2} = 1.63$.

In case (v)

$H = H(\{3\}, \{1, 4\}; X) = |-2 - (1 + 0)/2| / \sqrt{3/2} = 2.04$.

The largest of these is 2.04 which is greater than
$1.97 = C_2$. Thus, at step 2, $\{1, 3, 4\}$ is split into $\{3\}$
and $\{1, 4\}$ and we continue to step 3. At step 3 we
consider splitting $\{1, 4\}$ into two parts. $H$ is now
simply

$H = H(\{1\}, \{4\}; X) = |1 - 0| / \sqrt{2} = 0.71$.

Since $0.71 < 1.48 = C_1$ the set $\{1, 4\}$ remains intact
and the process stops. The final partition is $\{2\}$,
$\{3\}$ and $\{1, 4\}$. Recalling that if $i$ and $j$ are placed in
different sets then $H_{ij}$ will be rejected, we find that
$H_{14}$ is accepted, $H_{24}$ is rejected and $H_{34}$ is rejected.
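The computation in this example can be reproduced with a short script. This is a sketch of the partitioning rule as described above for the treatments versus control setting only; the function names and data layout are assumptions made here, and the critical values are indexed as in the example (step 1 uses $C_3$, step 2 uses $C_2$, step 3 uses $C_1$).

```python
import math
from typing import Dict, List, Sequence, Set

def H(A: Set[int], rest: Set[int], x: Dict[int, float]) -> float:
    """Equation (4.2): standardized absolute difference of two set means."""
    mean_a = sum(x[j] for j in A) / len(A)
    mean_r = sum(x[j] for j in rest) / len(rest)
    return abs(mean_a - mean_r) / math.sqrt(1 / len(A) + 1 / len(rest))

def rsd_treatments_vs_control(x: Dict[int, float], control: int,
                              crit: Sequence[float]) -> List[Set[int]]:
    """RSD partition when only the set holding the control may be split
    and every split peels off a single treatment (as in Example 4.1).

    crit = [C_1, ..., C_K]; step m compares the largest H with C_{K+1-m}.
    """
    block = set(x)                     # current set containing the control
    singletons: List[Set[int]] = []    # treatments separated so far
    K = len(crit)
    for step in range(1, K + 1):
        candidates = [(H({i}, block - {i}, x), i)
                      for i in block if i != control]
        if not candidates:
            break
        h_max, i_max = max(candidates)
        if h_max <= crit[K - step]:    # no split is large enough: stop
            break
        block = block - {i_max}
        singletons.append({i_max})
    return singletons + [block]

x = {1: 1.0, 2: 4.0, 3: -2.0, 4: 0.0}
partition = rsd_treatments_vs_control(x, control=4, crit=[1.48, 1.97, 2.40])
rejected = sorted(next(iter(s)) for s in partition if 4 not in s)
print(partition)   # [{2}, {3}, {1, 4}]
print(rejected)    # [2, 3]
```

The treatments listed in `rejected` are those separated from the control, which matches the conclusion of the example: $H_{24}$ and $H_{34}$ are rejected and $H_{14}$ is accepted.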

For each (treatment) $i = 1, 2, 3$ in this setting the
interval property would pertain to the behavior of
the test as $X_i$ increased and $X_4$ decreased while the
other (independent) variables remained fixed. Thus
the vector $\mathbf{g}$ would have a $-1$ in the fourth position,
a $+1$ in the $i$th position and zeroes elsewhere. It
is not difficult to check that the function $H$ given
in (4.2) satisfies the conditions of Theorems 4.1 and 4.2.
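One way to carry out this check for the two-sided function in (4.2) is sketched below. It is a worked calculation added for illustration, not taken from the paper, and the shorthand $D$, $n$ and $S$ is introduced only for this purpose.

```latex
% Notation (assumed here): g has +1 in position i, -1 in position 4,
% zeros elsewhere; D = B \setminus A contains 4; n = n(D);
% S = \sum_{j \in D} X_j; in both cases r = \sqrt{1 + 1/n}.
\begin{align*}
\text{(a) } A = \{i\}: \quad
  H(A, D; X + a g)
  &= \frac{\bigl| (X_i + a) - (S - a)/n \bigr|}{\sqrt{1 + 1/n}}
   = \frac{\bigl| (X_i - S/n) + a\,(1 + 1/n) \bigr|}{\sqrt{1 + 1/n}}, \\
\text{(b) } A = \{l\},\ l \ne i,\ i \in D: \quad
  H(A, D; X + a g)
  &= \frac{\bigl| X_l - (S + a - a)/n \bigr|}{\sqrt{1 + 1/n}}
   = H(A, D; X).
\end{align*}
```

Expression (a) has the form $|c + ba|$ with $b > 0$, so it is nonincreasing and then nondecreasing in $a$, as condition (i) of Theorem 4.2 requires, and (b) is constant in $a$, as in condition (ii). Condition (iii) does not arise in this example because $H$ is only evaluated on sets that contain the control. If the absolute value were dropped, (a) would be linear and increasing in $a$, which is the monotonicity asked for in condition (i) of Theorem 4.1.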

Example 4.2 (Change point in a normal model). 
Let Xi ~ N(/j,i, = 1, . . . , 10, be independent. The 
objective is to test H^i+i : m = Hi+i versus i^j+i : 
Mi 7^Mt+i)*= 1,...,9. 

To determine an RSD procedure we will begin by 
taking to be the collection of all sets contain- 
ing at least two consecutive integers chosen from 
{1,...,10}. f^i is the collection of sets containing 
consecutive integers chosen from among 1, . . . , 9. 
is the collection of sets containing consecutive inte- 
gers chosen from 2, . . . , 10. As our H(A, B \ A; X) 
function we will again use the function defined in 
Equation (4.2). Now there can be, at most, nine 
steps in the partition process. Again we can use nine 
constants coming from the Benjamini and Gavrilov 
(2009) critical values by using the normal distribu- 
tion with a = 0.05. 

At step 1 the possible partitions are 

A = {l,...,i}, S\A = {i + l,...,W} 

for i = 1, . . . , 9. 

Proceeding as in Example 4.1 we use the H function
and the constant $C_9$ to decide if and how to divide S.
Suppose it is determined (based on the data)
to split S into the sets {1,...,d} and {d+1,...,10}
for some d = 1,...,9. If d = 1, then at step 2 only
{2,...,10} is eligible to be split while if d = 9, only
{1,...,9} is eligible. However, if 1 < d < 9, then both
{1,...,d} and {d+1,...,10} must be considered at
step 2. At step 2 we consider all divisions of the form

A = {1,...,i}, B \ A = {i+1,...,d}

for i = 1,...,d-1

and

A = {d+1,...,i}, B \ A = {i+1,...,10}

for i = d+1,...,9.

Now using the H functions and the constant $C_8$
we would determine which one, if any, of the above
sets should be split. We continue in this fashion until
either there are no more sets eligible to be split
or none satisfy the criterion to be split. As in
Example 4.1, if i and i+1 are placed in different sets
of the final partition, then $H_{i(i+1)}$ will be rejected.
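The partition process just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, the H function is taken to be the standardized absolute difference of block means with unit variances (which reproduces the value 0.71 computed for {1} versus {4} in Example 4.1, but Equation (4.2) itself is not restated here), and at each step the single split with the largest H value over all eligible blocks is compared with that step's constant.

```python
import math
from typing import List, Sequence, Tuple


def h_stat(left: Sequence[float], right: Sequence[float]) -> float:
    """Assumed form of H: |mean(left) - mean(right)| scaled by its
    standard deviation when every observation has variance 1."""
    mean_l = sum(left) / len(left)
    mean_r = sum(right) / len(right)
    return abs(mean_l - mean_r) / math.sqrt(1 / len(left) + 1 / len(right))


def rsd_change_point(x: Sequence[float],
                     constants: Sequence[float]) -> List[Tuple[int, int]]:
    """Return the final partition as 1-based inclusive (start, end) blocks.

    constants[0] is used at step 1, constants[1] at step 2, and so on
    (for ten observations: C_9, C_8, ..., C_1).
    """
    blocks = [(1, len(x))]
    for c in constants:
        best = None  # (H value, index of block, last index of left part)
        for b, (lo, hi) in enumerate(blocks):
            for i in range(lo, hi):  # candidate split {lo..i} | {i+1..hi}
                h = h_stat(x[lo - 1:i], x[i:hi])
                if best is None or h > best[0]:
                    best = (h, b, i)
        if best is None or best[0] <= c:
            break  # nothing eligible, or no split meets the criterion
        _, b, i = best
        lo, hi = blocks[b]
        blocks[b:b + 1] = [(lo, i), (i + 1, hi)]
    return blocks
```

If i and i+1 end up in different blocks of the returned partition, the corresponding hypothesis would be rejected. For instance, with five observations equal to 0 followed by five equal to 5 and all nine constants set to 3.0 (an arbitrary illustrative value), the sketch returns the two blocks (1, 5) and (6, 10).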

5. MULTINOMIAL MODELS 

In this section we assume that there are k independent
multinomial populations each with q cells.
Let $\pi_i$, $i = 1, \ldots, k$, represent the ith population with
cell probabilities $p_{ij}$, $j = 1, \ldots, q$.

The individual testing problems are either $H_{ij}: \pi_i = \pi_j$
versus $K_{ij}: \pi_i < \pi_j$ or $H_{ij}: \pi_i = \pi_j$ versus
$K_{ij}: \pi_i \neq \pi_j$ where $i < j$. In this case $\pi_i < \pi_j$ means
population j is stochastically larger than population i,
that is, $\sum_{l=1}^{m} p_{il} \geq \sum_{l=1}^{m} p_{jl}$ for $m = 1, \ldots, q$
with some strict inequality.

Let $T(x_i, x_j)$ be the two-sample test statistics used
to test $H_{ij}$ that are to be used in the usual step-down
multiple testing procedure. A variety of such
test statistics have been recommended. See, for example,
Basso et al. (2009) (BPSS). Most such statistics,
when used to test $H_{ij}$, not as part of a step-down
multiple testing procedure, have the interval
property described below.

In this setting it is natural to consider a test's behavior
as $x_{i1}$ and $x_{jq}$ both increase while $x_{iq}$ and $x_{j1}$
both decrease. Such changes in data would suggest
to a practitioner an ever-increasing amount of stochastic
order. To be precise, suppose $(x_i, x_j)$ is a reject
sample point by virtue of using the two-sample
test $\varphi$. Next, for $a > 0$, consider any sample point $x^*$
where $x^*_{\alpha\beta} = x_{\alpha\beta} + a$ for $(\alpha, \beta) = (i, 1)$ and $(\alpha, \beta) = (j, q)$,
$x^*_{\alpha\beta} = x_{\alpha\beta} - a$ for $(\alpha, \beta) = (j, 1)$ and $(\alpha, \beta) = (i, q)$,
and $x^*_{\alpha\beta} = x_{\alpha\beta}$ otherwise. Then $\varphi$ has the interval
property if $\varphi$ also rejects at $(x_i^*, x_j^*)$. In other
words $(x_i^*, x_j^*)$ is more indicative of stochastic order
than $(x_i, x_j)$. So if $(x_i, x_j)$ is a reject point, $(x_i^*, x_j^*)$
should also be a reject point.

Here $\varphi$ has the interval property relative to the
2q x 1 vector g with 1 in positions 1 and 2q, -1 in
positions q and q+1 and 0 elsewhere. Thus for the
multiple testing problem the kq x 1 vector $g_{ij}$ has
the value +1 in positions (i-1)q+1 and jq,
the value -1 in positions iq and (j-1)q+1
and the value 0 in all other positions.
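The positions of the nonzero entries are easy to misread, so a small sketch of the construction may help. The function name `g_vector` is hypothetical, and the sketch assumes q is at least 2 so that the four positions are distinct.

```python
from typing import List


def g_vector(i: int, j: int, k: int, q: int) -> List[int]:
    """Build the kq x 1 direction vector for comparing populations i < j.

    Positions in the text are 1-based; Python lists are 0-based, hence
    the -1 offsets below.
    """
    if q < 2 or not (1 <= i < j <= k):
        raise ValueError("need q >= 2 and 1 <= i < j <= k")
    g = [0] * (k * q)
    g[(i - 1) * q] = 1       # position (i-1)q + 1: first cell of row i
    g[j * q - 1] = 1         # position jq: last cell of row j
    g[i * q - 1] = -1        # position iq: last cell of row i
    g[(j - 1) * q] = -1      # position (j-1)q + 1: first cell of row j
    return g
```

With three populations and three cells, `g_vector(1, 2, 3, 3)` has +1 in positions 1 and 6 and -1 in positions 3 and 4, which is the vector used in Example 5.1.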

It can be verified that all linear statistics and 
most nonlinear statistics listed in BPSS (2009), Sec- 
tion 2.2 have this interval property. However, these 
same statistics used as part of a step-down multiple 
testing procedure will often lead to induced tests 
that fail to have the interval property. 

5.1 Change Point 

In the one-sided change point problem the hypotheses
are $H_{i(i+1)}: \pi_i = \pi_{i+1}$ versus $K_{i(i+1)}: \pi_i < \pi_{i+1}$,
$i = 1, \ldots, k-1$. That is, in the above $j = i+1$. At
this point we will demonstrate a simple search that
would often lead to the result that the usual step-down
procedure for testing $H_{12}$, for example, will
not have the interval property. That is, if $\varphi_{12}$ denotes
the induced test of $H_{12}$ for the usual step-down
procedure, $\varphi_{12}$ will not have the interval property
relative to $g_{12}$. The only impediment to this type of
search is the fact that the data consist of integers
in each cell and if sample sizes are small this could
be problematic. An example will follow the recipe.

We follow the pattern exhibited in Figure 1 while
allowing for the presence of additional hypotheses
(i.e., k can be greater than 3). Recall that $T_{i(i+1)}(x)$
depends only on $(x_i, x_{i+1})$. Begin by choosing a sample
point $x = (x_1', x_2', \ldots, x_k')'$ so that $T_{i(i+1)}(x) > C_i$,
$i = 3, \ldots, k-1$; $T_{12}(x) = C_1 + \varepsilon_1$, $T_{23}(x) = C_2 + \varepsilon_2$,
$\varepsilon_1 > 0$, $\varepsilon_2 > 0$. At x, all hypotheses are rejected
by step-down. Next consider points $x^*$ of the form
$x^* = x + a g$. That is, $x^* = (x_1^{*\prime}, \ldots, x_k^{*\prime})'$ where $x_i^* = x_i$
for $i = 3, \ldots, k$ but $x^*_{11} = x_{11} + a$, $x^*_{1j} = x_{1j}$,
$j = 2, \ldots, q-1$, $x^*_{1q} = x_{1q} - a$, $x^*_{21} = x_{21} - a$, $x^*_{2j} = x_{2j}$,
$j = 2, \ldots, q-1$, $x^*_{2q} = x_{2q} + a$.

We note that for most of the statistics used in
BPSS (2009) $T_{12}$ is an increasing function of a, $T_{23}$
is a decreasing function of a and $T_{i(i+1)}$ for $i \geq 3$ does
not change with a. Choose $a > 0$ so that $T_{23}(x^*) < C_2$
and $C_1 + \varepsilon_1 < T_{12}(x^*) < C_2$. Hence at $x^*$ the
step-down procedure would reject $H_{i(i+1)}$ for $i \geq 3$,
but $H_{12}$ and $H_{23}$ would be accepted. Thus the usual
step-down procedure does not have the interval property
in this case.

Example 5.1. Consider three independent multinomial
distributions, each with three cells. Test $H_{12}: \pi_1 = \pi_2$
versus $K_{12}: \pi_1 < \pi_2$ and $H_{23}: \pi_2 = \pi_3$ versus
$K_{23}: \pi_2 < \pi_3$. Use Wilcoxon-Mann-Whitney (WMW)
test statistics $W_{i(i+1)}$ using midranks. See BPSS (2009).
The statistics are then normalized by letting
$Z_{i(i+1)} = [W_{i(i+1)} - m(m+n+1)/2]/\sqrt{mn(m+n+1)/12}$,
where m and n are the row totals of a two-row table.

For the usual step-down procedure choose constants
$C_1 = 1.645$ and $C_2 = 1.96$. The data in Table 1
offer sample point x.

The statistics are $Z_{12}(x) = 1.653$ and $Z_{23}(x) = 2.006$,
leading to rejection of $H_{23}$ followed by rejection
of $H_{12}$. Now we simply choose a = 1 to get the
sample point $x^*$ corresponding to Table 2. For $x^*$,
$Z_{12}(x^*) = 1.954$ and $Z_{23}(x^*) = 1.865$. The usual step-down
procedure now accepts both hypotheses at $x^*$.
Thus the usual step-down procedure with induced
test $\varphi_{12}$ for $H_{12}$ does not have the interval property
relative to $g_{12}$, where $g_{12}$ has a 1 in positions 1 and 6,
a -1 in positions 3 and 4 and 0 elsewhere.
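The numbers in this example can be checked with a short sketch. Two things here are assumptions rather than statements from the text: the three rows of Table 1 are recovered from the averaged rows reported in Tables 3 and 4, and the rank sum is taken over the second row of each two-row table without a tie correction, a choice that reproduces the quoted values. The function names are hypothetical.

```python
import math
from typing import Dict, Sequence, Set


def wmw_z(row1: Sequence[float], row2: Sequence[float]) -> float:
    """Normalized WMW statistic for a 2 x q table using midranks."""
    m, n = sum(row1), sum(row2)
    below = 0.0   # number of observations in cells to the left
    w = 0.0       # rank sum of the second row
    for c1, c2 in zip(row1, row2):
        col = c1 + c2
        w += c2 * (below + (col + 1) / 2)   # midrank shared by the cell
        below += col
    return (w - n * (m + n + 1) / 2) / math.sqrt(m * n * (m + n + 1) / 12)


def step_down(stats: Dict[str, float], constants: Sequence[float]) -> Set[str]:
    """Usual step-down: constants are C_1 <= ... <= C_m; the largest
    statistic is compared with C_m, the next with C_{m-1}, and so on."""
    order = sorted(stats, key=stats.get, reverse=True)
    rejected: Set[str] = set()
    for step, name in enumerate(order):
        if stats[name] <= constants[len(order) - 1 - step]:
            break
        rejected.add(name)
    return rejected


# Rows (Same, Improved, Cured) recovered from Tables 3 and 4 (assumption).
placebo, dose1, dose2 = (15, 226, 4), (4, 226, 15), (6, 196, 43)
# Sample point x* with a = 1: move one count along g_12.
placebo_s, dose1_s = (16, 226, 3), (3, 226, 16)

at_x = {"H12": wmw_z(placebo, dose1), "H23": wmw_z(dose1, dose2)}
at_x_star = {"H12": wmw_z(placebo_s, dose1_s), "H23": wmw_z(dose1_s, dose2)}
rejected_x = step_down(at_x, [1.645, 1.96])
rejected_x_star = step_down(at_x_star, [1.645, 1.96])
```

Under these assumptions `rejected_x` contains both hypotheses while `rejected_x_star` is empty, even though the statistic for $H_{12}$ has increased from x to x*.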

Next we introduce another procedure based on the
RSD method that does have the interval property.
Informally, the RSD approach will, at each stage,
consider collections of 2 x q tables formed by collapsing
sets of consecutive rows. It will then apply
a two-sample test having the interval property to
these adaptively formed 2 x q tables. In order to
make this precise we need only define the function H
and the sets $\Omega$, $\Omega_1$ and $\Omega_2$. First we take $\Omega$ to be
the collection of sets containing at least two consecutive
integers and take $\Omega_1 = \Omega_2$ to be the collection
of all sets of consecutive integers chosen from
S = {1, 2, ..., k}. Then for any T having the interval
property relative to g let

H(A, B \ A; x) = T(Y(A), Y(B \ A)),

where Y is as defined in Equation (4.1).

Now we use the current choice of g along with the 
definitions of Y and H as well as the fact that T has 
the interval property relative to g. This allows us to 
verify that assumptions (i)-(iii) of Theorem 4.1 are 
satisfied. Thus we have 

Theorem 5.1. RSD has the interval property. 

To demonstrate the use of the RSD methodology 
here we apply it to the model of Example 5.1. 
Table 3
Data of Table 1 with first two rows combined

                         Same   Improved   Cured   Total
(Placebo + Dose 1)/2      9.5        226     9.5     245
Dose 2                    6          196    43       245

Table 4
Data of Table 1 with second two rows combined

                         Same   Improved   Cured   Total
Placebo                  15          226     4       245
(Dose 1 + Dose 2)/2       5          211    29       245

Example 5.1 (Continued). RSD for the data in
Table 1, which represents sample point x, is carried
out as follows: First Tables 3 and 4 are formed from
Table 1 by averaging frequencies in rows 1 and 2 for
Table 3 and averaging rows 2 and 3 for Table 4.

At step 1, WMW test statistics $W_{12,3}(x)$ and
$W_{1,23}(x)$ are calculated using midranks and then
converted to normalized statistics $Z_{12,3}(x)$ and
$Z_{1,23}(x)$. We calculate $Z_{12,3}(x) = 2.78$ and $Z_{1,23}(x) = 2.603$.
Using critical values $C_1 = 1.645$ and $C_2 = 1.96$
we reject $H_{23}$ at step 1 based on $Z_{12,3}(x)$. At step 2
we test $H_{12}$ by using $W_{12}(x)$ normalized to $Z_{12}(x) = 1.653$
and thereby reject $H_{12}$ as well. The sample
point $x^*$ is represented by the data in Table 2. Proceeding
as above we calculate $Z_{12,3}(x^*) = 2.78$ and
$Z_{1,23}(x^*) = 2.824$. This leads to rejection of $H_{23}$.
Next calculate $Z_{12}(x^*) = 1.954$, the value already
obtained for the usual step-down procedure, which
leads to rejection of $H_{12}$.
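The collapsed-table statistics quoted above can be reproduced with the following sketch. As before, the rows of Table 1 are inferred from Tables 3 and 4, the rank sum is taken over the second row with midranks and no tie correction, and the function names are hypothetical; fractional cell counts are handled by the same midrank formula.

```python
import math
from typing import List, Sequence


def wmw_z(row1: Sequence[float], row2: Sequence[float]) -> float:
    """Normalized WMW statistic for a 2 x q table using midranks."""
    m, n = sum(row1), sum(row2)
    below, w = 0.0, 0.0
    for c1, c2 in zip(row1, row2):
        col = c1 + c2
        w += c2 * (below + (col + 1) / 2)
        below += col
    return (w - n * (m + n + 1) / 2) / math.sqrt(m * n * (m + n + 1) / 12)


def average_rows(*rows: Sequence[float]) -> List[float]:
    """Collapse several rows into one by averaging cell frequencies."""
    return [sum(cells) / len(rows) for cells in zip(*rows)]


# Sample point x (rows inferred from Tables 3 and 4) and x* (a = 1).
placebo, dose1, dose2 = (15, 226, 4), (4, 226, 15), (6, 196, 43)
placebo_s, dose1_s = (16, 226, 3), (3, 226, 16)

z_12_3 = wmw_z(average_rows(placebo, dose1), dose2)            # Table 3
z_1_23 = wmw_z(placebo, average_rows(dose1, dose2))            # Table 4
z_12_3_star = wmw_z(average_rows(placebo_s, dose1_s), dose2)
z_1_23_star = wmw_z(placebo_s, average_rows(dose1_s, dose2))
```

Moving from x to x* leaves the combined first two rows unchanged, which is why the statistic for the split {1,2} versus {3} is the same at both points.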

5.2 Treatments versus Control 

Let $\pi_k$ be the control population. The hypotheses
are $H_{ik}: \pi_i = \pi_k$ versus $K_{ik}: \pi_i \neq \pi_k$, $i = 1, \ldots, k-1$.
Let $T(x_i, x_k)$ be the two-sample test statistics used
for testing $H_{ik}$ that are to be used in the usual step-down
testing procedure. A wide variety of such tests
are listed in BPSS (2009). When we focus on just one
hypothesis testing problem we are again comparing
just two populations. Therefore the natural g is the
same as that defined in the beginning of this section.
That is, the two-sample interval property is relative
to the 2q x 1 vector g with 1 in positions 1 and 2q,
-1 in positions q and q+1 and 0 elsewhere. For the
multiple testing problem the kq x 1 vector $g_{ik}$ has
the value +1 in positions (i-1)q+1 and kq,
the value -1 in positions iq and (k-1)q+1
and the value 0 in all other positions.

To show that the usual step-down procedure does
not have the interval property we follow the pattern
exhibited in Figure 2 while allowing for the presence
of additional hypotheses (i.e., k can be greater
than 3). Again the discreteness could create a problem
with small sample sizes. Recall that $T_{ik}(x)$ depends
only on $(x_i, x_k)$.

Choose a sample point x so that $x_1$ and $x_k$ are
the same, $x_i$, $i = 3, \ldots, k-1$, are such that $T_{ik}(x)$
exceeds $C_i$ by a substantial amount, and $x_2$ is such that
$T_{2k}(x) > C_1 + C_2$. Thus at x, $H_{1k}$ is accepted. Now
choose $x^*$ so that $C_1 < T_{1k}(x^*) < C_2$ and $T_{2k}(x^*) = C_2 + \delta$.
This is possible since $T_{1k}$ has the interval
property and since $x_1^*$ is closer to $x_k^*$ than $x_2$ is
to $x_k$. Now at $x^*$ the procedure rejects $H_{1k}$ and $H_{2k}$.
Finally choose $x^{**}$ so that $T_{2k}(x^{**}) < C_2$ and
$T_{1k}(x^{**}) < C_2$. This is possible since $x^{**}$ is such
that $x_1^{**}$ and $x_k^{**}$ are moving further apart while $x_2^{**}$
and $x_k^{**}$ are moving closer to each other. Thus at $x^{**}$,
$H_{1k}$ and $H_{2k}$ are accepted. This demonstrates that
the usual step-down procedure lacks the interval
property relative to g.

Now we indicate the RSD method that does have
the interval property. Informally, the RSD approach
will, at each stage, consider collections of 2 x q tables
formed by taking one row to be one of the treatments
while the other row is the result of combining all
other treatments with the control. It will then apply
a two-sample test having the interval property to
these adaptively formed 2 x q tables. In order to
make this precise we need only define the function H
and the sets $\Omega$, $\Omega_1$ and $\Omega_2$. First we take $\Omega$ to be
the collection of all sets containing k and at least
one other integer chosen from {1, 2, ..., k-1}. $\Omega_1$ is
the collection of sets containing exactly one integer.
$\Omega_2$ is the collection of sets containing the integer k.
Then for any T having the interval property relative
to g let

H(A, B \ A; x) = T(Y(A), Y(B \ A)).

Now we use the current choice of g along with the 
definitions of Y and H as well as the fact that T has 
the interval property relative to g. This allows us to 
verify that assumptions (i)-(iii) of Theorem 4.2 are 
satisfied. Thus we have 

Theorem 5.2. RSD has the interval property. 

5.3 All Pairwise Differences 

The hypotheses are $H_{ij}: \pi_i = \pi_j$ versus $K_{ij}: \pi_i \neq \pi_j$,
$i = 1, \ldots, k-1$, $j = i+1, \ldots, k$. Once again it can
be shown that the usual step-down procedure does
not have the interval property in this case. Focusing
on $H_{12}$ and utilizing statistics $T_{12}$ and $T_{23}$ as in
the arguments of Section 5.1 will suffice to give the
results in this case.

We now offer an RSD procedure that does have 
the interval property. The basis of this RSD proce- 
dure is the PADD procedure for testing all pairwise 
normal means in CSC (2010). For the multinomial 
case we describe the procedure now. 

Again it suffices to follow the exposition in Section 3.
Here we let $\Omega$ be the collection of all sets
containing at least two integers. Further let $\Omega_1 = \Omega_2$
be the collection of all nonempty subsets of
S = {1, 2, ..., k}. Next take

H(A, B \ A; x) = T(Y(A; x), Y(B \ A; x)),

where T is any test statistic for testing independence 
in a 2 x q table that has the interval property relative 
to g. 

The interpretation is as follows: By definition ev- 
ery Y(A;x) will be the result of combining all rows 
corresponding to indices in A. In determining how 
a set B might be split we look at every possible way 
to collapse all the rows corresponding to the indices 
in B into just two rows. Then a test is performed 
for each resulting 2 x q table. For example, if k = 4
and B = {1,2,3,4}, then the possible splits are {1} 
and {2,3,4}, {2} and {1,3,4}, {3} and {1,2,4}, {4} 
and {1,2,3}, {1,2} and {3,4}, {1,3} and {2,4} or 
{1,4} and {2,3}. 
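The enumeration of candidate splits can be sketched as follows. The function name is hypothetical; the only idea used is that each unordered split of B into two nonempty parts is generated exactly once by always placing the smallest index in the first part.

```python
from itertools import combinations
from typing import Iterable, List, Tuple

Split = Tuple[Tuple[int, ...], Tuple[int, ...]]


def two_way_splits(block: Iterable[int]) -> List[Split]:
    """List every way to divide `block` into two nonempty sets."""
    items = sorted(block)
    first, rest = items[0], items[1:]
    splits: List[Split] = []
    # `size` counts how many other indices join the smallest index;
    # it stops short of len(rest) so the second part is never empty.
    for size in range(len(rest)):
        for extra in combinations(rest, size):
            part_a = (first, *extra)
            part_b = tuple(v for v in rest if v not in extra)
            splits.append((part_a, part_b))
    return splits
```

For B = {1, 2, 3, 4} this returns the seven splits listed above; in general a set with r indices has 2^(r-1) - 1 such splits, so the number of 2 x q tables examined grows quickly with the size of B.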

With these definitions one can check that assump- 
tions (i)-(iii) of Theorem 4.2 are satisfied. Thus we 
have 

Theorem 5.3. RSD has the interval property. 

6. MULTIVARIATE NORMAL MODELS 

Let $X_i$, $i = 1, \ldots, k$, be independent q-variate normal
random vectors with mean vectors $\mu_i$ and known
nonsingular covariance matrix $\Sigma$. All hypotheses are
concerned with pairwise differences between mean
vectors. In light of this we assume without loss of
generality that $\Sigma = I$. The two-sample test statistic
that will serve as the basis for all usual step-down
procedures considered to test $H_{ij}: \mu_i = \mu_j$ versus
$K_{ij}: \mu_i \neq \mu_j$ is

(6.1)  $T(x_i, x_j) = (x_i - x_j)'(x_i - x_j)/2$

which, under $H_{ij}$, has a chi-squared distribution with
q degrees of freedom.

Here a natural form of the interval property is
along points

(6.2)  $x = (x_1', x_2', \ldots, x_k')'$,

(6.3)  $x^* = ((x_1 - r_1 \mathbf{1})', (x_2 + r_1 \mathbf{1})', x_3', \ldots, x_k')'$,

(6.4)  $x^{**} = ((x_1 - r_2 \mathbf{1})', (x_2 + r_2 \mathbf{1})', x_3', \ldots, x_k')'$,

where $0 < r_1 < r_2$ and $\mathbf{1}$ is a vector of all 1's. Thus
$g = (-1, \ldots, -1, 1, \ldots, 1)'$ and $g_{ij}$ has entries of -1 for
coordinates corresponding to population i, 1 for coordinates
corresponding to population j and 0 elsewhere.
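To make the direction of movement in (6.2)-(6.4) concrete, here is a minimal sketch with hypothetical function names and made-up data. It evaluates the statistic (6.1) before and after moving the first two populations along g; the numbers are illustrative only and are not taken from the paper.

```python
from typing import List, Sequence


def t_stat(xi: Sequence[float], xj: Sequence[float]) -> float:
    """Statistic (6.1): squared distance between the vectors, over 2."""
    return sum((a - b) ** 2 for a, b in zip(xi, xj)) / 2


def shift_along_g(x: List[List[float]], r: float) -> List[List[float]]:
    """Subtract r from every coordinate of x_1 and add r to every
    coordinate of x_2, leaving the other populations unchanged."""
    shifted = [list(v) for v in x]
    shifted[0] = [v - r for v in x[0]]
    shifted[1] = [v + r for v in x[1]]
    return shifted


x = [[0.0, 0.0], [1.0, 1.0], [3.0, 3.0]]   # k = 3, q = 2 (made-up data)
x_star = shift_along_g(x, 1.0)

t12, t23 = t_stat(x[0], x[1]), t_stat(x[1], x[2])
t12_star = t_stat(x_star[0], x_star[1])
t23_star = t_stat(x_star[1], x_star[2])
```

In this particular example the statistic for the first pair rises from 1 to 9 while the statistic for the second pair falls from 4 to 1, which is the kind of opposing movement exploited in the step-down counterexamples.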

6.1 All Pairwise Differences 

The case of q = 1 has been studied by CSC (2010).
For arbitrary q, the lack of the interval property of
the usual step-down procedure is shown by focusing
on $H_{12}$ and utilizing statistics $T_{12}$, $T_{23}$ as in the
argument of Section 5.1.

At this point we describe an RSD which does have
the interval property. Here we let $\Omega$ be the collection
of all sets containing at least two integers. Further
let $\Omega_1 = \Omega_2$ be the collection of all nonempty subsets
of S = {1, 2, ..., k}. Next take

$H(A, B \setminus A; x) = T(Y(A; x)/n(A), Y(B \setminus A; x)/n(B \setminus A)) / (1/n(A) + 1/n(B \setminus A)).$

Again the assumptions of Theorem 4.2 can be verified
and the interval property established.

6.2 Change Point 

The hypotheses are $H_{i(i+1)}: \mu_i = \mu_{i+1}$ versus
$K_{i(i+1)}: \mu_i \neq \mu_{i+1}$, $i = 1, 2, \ldots, k-1$. Test statistics
for the usual step-down procedure are $T(x_i, x_{i+1})$ as
given in (6.1). The lack of the interval property for
the usual step-down is shown by focusing on $H_{12}$
and utilizing statistics $T_{12}$ and $T_{23}$ as in the argument
of Section 5.1. Here again we let $x$, $x^*$, $x^{**}$ be
as in (6.2), (6.3) and (6.4).

For RSD we proceed as follows: Take $\Omega$ to be
the collection of sets containing at least two consecutive
integers and take $\Omega_1 = \Omega_2$ to be the collection
of all sets of consecutive integers chosen from
S = {1, 2, ..., k} and again choose

$H(A, B \setminus A; x) = T(Y(A; x)/n(A), Y(B \setminus A; x)/n(B \setminus A)) / (1/n(A) + 1/n(B \setminus A)).$

Once again the assumptions of Theorem 4.2 can
be verified and so RSD has the interval property in
this case.
Remark 6.1. For the univariate normal change
point problem, MRD is a special case of an RSD 
procedure. For a numerical simulation study com- 
paring MRD with step-down see Cohen, Sackrowitz 
and Xu (2009). 

6.3 Treatments versus Control 

The case q = 1 is treated in CSX (2009) and the 
case of arbitrary q was treated in Cohen, Sackrowitz 
and Xu (2008) (CSX). 

The hypotheses are $H_{ik}: \mu_i = \mu_k$ versus $K_{ik}: \mu_i \neq \mu_k$,
$i = 1, 2, \ldots, k-1$. The usual step-down two-sample
statistics at step 1 are $T_{ik} = (x_i - x_k)'(x_i - x_k)/2$.
To determine the RSD procedure we take $\Omega$
to be the collection of all sets containing the integer
k and at least one other integer chosen from
{1, 2, ..., k-1}. $\Omega_1$ is the collection of sets containing
exactly one integer from among {1, ..., k-1}.
$\Omega_2$ is the collection of sets containing the integer k.
As in Sections 6.1 and 6.2 let

(6.5)  $H(A, B \setminus A; x) = T(Y(A; x)/n(A), Y(B \setminus A; x)/n(B \setminus A)) / (1/n(A) + 1/n(B \setminus A)).$

The RSD we use in this situation is simply the vector
analog to the procedure shown in Example 4.1.
Now, of course, q > 1, scalar variables and parameters
become vectors and the number of treatments
is k - 1. For the function H we use the vector analog
to (4.2) that is given in (6.5). Implementation
follows the same steps as in Example 4.1. The only
difference might be in the choice of constants as discussed
below.

Here again it can be shown that the usual step-down
test of $H_{ik}$ does not have the interval property
when g = (0, ..., 0, -1, 0, ..., 0, 1) with the -1 in
the ith position while RSD does have the interval
property.

We now give two simple examples of how the RSD
method might be constructed and used. First we
mention that for the standard step-up procedure
the Benjamini and Hochberg (1995) constants in the
two-sided case are given by

(6.6)  $C_i^{BH} = \Phi^{-1}(1 - (k + 1 - i)(\alpha/2)/k).$

The constants given in Benjamini and Gavrilov
(2009) are

(6.7)  $C_i^{BG} = \Phi^{-1}(1 - i(\alpha/2)/(k + 1 - i(1 - \alpha/2))).$
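The two families of constants are straightforward to tabulate. The sketch below simply evaluates (6.6) and (6.7) as written for i = 1, ..., k using the standard normal quantile function from the Python standard library; the function names are hypothetical, and which value of k to plug in for a given problem is left to the reader.

```python
from statistics import NormalDist
from typing import List

_PHI_INV = NormalDist().inv_cdf   # standard normal quantile function


def bh_constants(k: int, alpha: float) -> List[float]:
    """Constants of (6.6) for i = 1, ..., k (two-sided case)."""
    return [_PHI_INV(1 - (k + 1 - i) * (alpha / 2) / k)
            for i in range(1, k + 1)]


def bg_constants(k: int, alpha: float) -> List[float]:
    """Constants of (6.7) for i = 1, ..., k (two-sided case)."""
    return [_PHI_INV(1 - i * (alpha / 2) / (k + 1 - i * (1 - alpha / 2)))
            for i in range(1, k + 1)]
```

With this indexing the values from (6.6) increase with i, starting at the familiar two-sided quantile for i = 1, whereas the values from (6.7) decrease with i.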

Take q = 1 and k = 101 so we have 100 treatments
and one control. Suppose further that the only reasonable
scenario is that the number of truly significant
treatments is sparse, say, at the very most,
15% of the treatments. Table 5 gives the results of
a simulation using 5000 iterations at each parameter
point. We compare the RSD method with step-up
on the criteria of FDR, the expected number of
Type I errors and the expected number of Type II errors.
For RSD we were able to use the critical values
of (6.7) with $\alpha = 0.05$ without any modification. For
step-up, on the other hand, using $\alpha = 0.05$ in (6.6)
resulted in a procedure that was (due to the dependence)
too conservative and put it at a disadvantage.
Instead we found, using simulation, that taking
$\alpha = 0.07$ in (6.6) gave a better performing procedure
for this covariance structure. For this application
RSD has the interval property, is comparable to
step-up relative to FDR and makes fewer mistakes
than step-up. Table 6 allows for a less sparse situation
allowing as many as 24% better treatments.
Here simulation indicated that we should again take
$\alpha = 0.07$ in (6.6) for step-up and the critical values
of RSD should correspond to $\alpha = 0.03$ in (6.7).

Table 5
Performance of RSD and SU. The mean of the control population is 0.0. Each mean value listed represents five treatments.
All unspecified means are equal to 0.0

                                       Expected number of errors
Means for treatment number       Type I        Type II         Total           FDR
    1-5      6-10     11-15     RSD    SU     RSD     SU     RSD     SU     RSD     SU
   0.00      0.00      0.00     0.1   0.7     0.0    0.0     0.1    0.7   0.048  0.045
   0.00      0.00     -2.00     0.1   0.7     3.5    4.4     3.6    5.1   0.046  0.050
   0.00      0.00     -4.00     0.3   0.8     0.0    0.8     0.4    1.6   0.051  0.054
   0.00      2.00     -2.00     0.3   0.7     6.0    8.8     6.2    9.5   0.045  0.044
   0.00      2.00      2.00     0.2   0.8     6.8    8.5     7.0    9.2   0.048  0.044
   0.00      2.00     -4.00     0.4   1.0     2.7    4.6     3.1    5.6   0.049  0.054
   0.00      2.00      4.00     0.4   0.8     2.7    4.8     3.2    5.6   0.048  0.048
   0.00      4.00     -4.00     0.6   0.9     0.0    1.0     0.6    1.9   0.050  0.052
   0.00      4.00      4.00     0.0   0.9     0.0    1.1     0.6    2.0   0.049  0.050
   2.00      2.00     -2.00     0.4   0.9     8.1   12.8     8.5   13.7   0.045  0.048
   2.00      2.00      2.00     0.4   0.9    10.0   12.3    10.3   13.2   0.055  0.045
   2.00      2.00     -4.00     0.6   0.9     5.3    8.2     5.9    9.2   0.051  0.048
   2.00      2.00      4.00     0.6   0.9     5.3    8.6     5.9    9.4   0.034  0.047
   2.00      4.00     -4.00     0.7   1.1     2.3    4.6     3.0    5.7   0.049  0.052
   2.00      4.00      4.00     0.7   1.0     2.3    4.7     3.0    5.7   0.049  0.049
   4.00      4.00     -4.00     0.8   1.2     0.0    1.1     0.8    2.3   0.048  0.050
   4.00      4.00      4.00     0.8   1.3     0.0    1.3     0.9    2.6   0.050  0.055

Table 6
Performance of RSD and SU. The mean of the control population is 0.0. Each mean value listed represents eight treatments.
All unspecified means are equal to 0.0

                                       Expected number of errors
Means for treatment number       Type I        Type II         Total           FDR
    1-8      9-16     17-24     RSD    SU     RSD     SU     RSD     SU     RSD     SU
   0.00      0.00      0.00    (remaining entries illegible in the source)
   0.00      0.00     -2.00    (remaining entries illegible in the source)
   0.00      0.00     -4.00     0.3   0.8     0.0    0.9     0.3    1.8   0.031  0.051
   0.00      2.00     -2.00     0.2   0.8     9.6   13.7     9.8   14.5   0.027  0.043
   0.00      2.00      2.00     0.2   0.8    12.1   13.0    12.3   13.8   0.037  0.043
   0.00      2.00     -4.00     0.4   1.1     4.5    6.7     4.9    7.8   0.030  0.051
   0.00      2.00      4.00     0.4   1.0     4.6    6.9     5.0    7.9   0.029  0.048
   0.00      4.00     -4.00     0.5   1.4     0.0    1.2     0.6    2.6   0.030  0.056
   0.00      4.00      4.00     0.5   1.3     0.0    1.3     0.6    2.6   0.030  0.053
   2.00      2.00     -2.00     0.3   1.0    13.3   19.6    13.6   20.6   0.028  0.045
   2.00      2.00      2.00     0.3   1.0    19.2   18.8    19.5   19.7   0.058  0.040
   2.00      2.00     -4.00     0.5   1.0     9.4   12.1     9.9   13.1   0.034  0.045
   2.00      2.00      4.00     0.5   1.0     9.4   12.5    10.0   13.5   0.034  0.045
   2.00      4.00     -4.00     0.6   1.3     3.8    6.5     4.5    7.8   0.030  0.048
   2.00      4.00      4.00     0.6   1.2     3.8    6.7     4.5    7.9   0.029  0.046
   4.00      4.00     -4.00     0.8   1.4     0.0    1.3     0.8    2.7   0.030  0.047
   4.00      4.00      4.00     0.8   1.6     0.0    1.4     0.8    3.0   0.030  0.052

In both Tables 5 and 6 the mean of the control
population is taken to be 0.0. In Table 5 the means
given in the first three columns each represent five
treatment means. The other 85 treatment means are
0.0. For example, in the next to last row, the first
10 treatment means would be 4.00 and the next five
treatment means would be -4.00. In this case 15%
of the treatment means would be nonzero. In Table 6 the
means given in the first three columns each represent
eight treatment means. Thus at most 24% of the
treatment means would be nonzero.
Note both Tables 5 and 6 indicate fewer errors for
RSD for all parameter points considered.

Remark 6.2. For the univariate normal treat- 
ments versus control problem MRD is a special case 
and natural choice of an RSD procedure. One of 
the simulation studies in Cohen, Sackrowitz and Xu 
(2009) was done for this same model but for many 
more treatments. Both step-up and step-down were 
considered. As described in that paper it was more 
difficult to arrive at appropriate choices for critical 
values. The nature of the results was the same but, 
due to the large number of populations, the results 
were stronger. 

7. NONPARAMETRIC MODELS

Nonparametric multiple testing is discussed in
Hochberg and Tamhane (1987). Here we begin with n
independent observations from each of k independent
populations $F_1, \ldots, F_k$. The collection of all nk
observations is ranked and we let $R_i$ = the average
of the ranks for the observations coming from
population i. Also let $R = (R_1, \ldots, R_k)'$. For testing
$H_{ij}: F_i = F_j$ versus $K_{ij}: F_i < F_j$ or $H_{ij}: F_i = F_j$ versus
$K_{ij}: F_i \neq F_j$ based on R it is natural to study
the behavior of testing procedures as $R_i$ decreases
and $R_j$ increases.

This model fits our original setting with $R_i$ playing
the role of $x_i$ and q = 1. Here $g = (-1, 1)'$ and $g_{ij}$
is the k x 1 vector with -1 as the ith coordinate, 1
as the jth coordinate and 0 elsewhere.

7.1 All Pairwise Differences 

The problem of nonparametric multiple testing of 
all pairwise comparisons of distributions has been 
treated by Cohen and Sackrowitz (2012) (CS). There 
it is shown that the step-down procedure of Camp- 
bell and Skillings (1985) based on ranks lacks an 
interval property. It is also shown in CS (2010) that 
the RSD procedure (called RPADD there) does have 
the interval property. 

7.2 Change Point 

Next we consider testing $H_{i(i+1)}: F_i = F_{i+1}$ versus
$K_{i(i+1)}: F_i < F_{i+1}$, $i = 1, \ldots, k-1$, assuming
$F_1 \leq F_2 \leq \cdots \leq F_k$. Assume sample sizes are n for each
population. It is possible to show that a typical step-down
procedure using two-sample rank tests (based
on separate ranks or joint ranks) for $H_{i(i+1)}$ would
not have the interval property. However, the RSD
procedure which we now describe will have the interval
property. As in the other change point settings,
take $\Omega$ to be the collection of sets containing
at least two consecutive integers and take $\Omega_1 = \Omega_2$
to be the collection of all sets of consecutive integers
chosen from S = {1, 2, ..., k}. Here we let

$H(A, B \setminus A; R) = (Y(A; R)/N(A) - Y(B \setminus A; R)/N(B \setminus A))/\sigma_{A,B},$

where

$\sigma^2_{A,B} = w(1/N(A) + 1/N(B \setminus A))/12$ and

$w = k(kn + 1)$.

With these definitions it is easy to verify the condi- 
tions of Theorem 4.1 to obtain 

Theorem 7.1. RSD has the interval property
for testing $H_{i(i+1)}$.
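A minimal sketch of the rank-based H function follows. It assumes that Y(A; R) is the sum of the average ranks $R_i$ over i in A and that N(A) is the number of populations in A; both are readings of definitions given earlier in the paper rather than statements made in this section. The function names and the data are made up for illustration.

```python
import math
from typing import List, Sequence


def average_ranks(samples: Sequence[Sequence[float]]) -> List[float]:
    """Rank the pooled observations (midranks for ties) and return the
    average rank of each population."""
    pooled = sorted(v for s in samples for v in s)

    def midrank(v: float) -> float:
        first = pooled.index(v) + 1          # 1-based rank of first copy
        return first + (pooled.count(v) - 1) / 2

    return [sum(midrank(v) for v in s) / len(s) for s in samples]


def h_rank(avg_ranks: Sequence[float], a: Sequence[int],
           b: Sequence[int], n: int) -> float:
    """H for index sets a and b (1-based), with n observations per
    population and w = k(kn + 1)."""
    k = len(avg_ranks)
    w = k * (k * n + 1)
    mean_a = sum(avg_ranks[i - 1] for i in a) / len(a)
    mean_b = sum(avg_ranks[i - 1] for i in b) / len(b)
    sigma = math.sqrt(w * (1 / len(a) + 1 / len(b)) / 12)
    return (mean_a - mean_b) / sigma


samples = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]   # k = 3, n = 2
ranks = average_ranks(samples)
h_first_split = h_rank(ranks, [1], [2, 3], n=2)
```

For these data the average ranks are 1.5, 3.5 and 5.5, and the split {1} versus {2, 3} gives a value of about -1.85; the sign follows the order of subtraction in the displayed formula.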

7.3 Treatments versus Control 

For testing treatments versus control the hypotheses
are $H_{ik}: F_i = F_k$ versus $K_{ik}: F_i \neq F_k$. Now consider
the usual step-down procedure which is based
on the two-population statistic

$T_{ik} = |R_i - R_k|/\sigma_{\{i\},\{k\}}$

in comparing the ith treatment with the control. It
can be shown that the usual step-down procedure
does not have the interval property for testing $H_{ik}$.


On the other hand, it can be shown that the RSD
procedure for this model does have the interval property
for testing $H_{ik}$. RSD in this case is defined as
follows: Let $\Omega$ be the collection of all sets containing
k and at least one other integer chosen from
S = {1, 2, ..., k-1}. $\Omega_1$ is the collection of sets containing
exactly one integer. $\Omega_2$ is the collection of
sets containing the integer k. Then take

$H(A, B \setminus A; R) = |Y(A; R)/N(A) - Y(B \setminus A; R)/N(B \setminus A)|/\sigma_{A,B},$

where $\sigma^2_{A,B}$ is as defined in Section 7.2 above. With
these definitions it is easy to verify the conditions of
Theorem 4.1 to obtain

Theorem 7.2. RSD has the interval property
for testing $H_{ik}$.